Abstract. Log-concave distributions are an attractive choice for modeling and inference, for several reasons: The class of log-concave distributions contains most of the commonly used parametric distributions and thus is a rich and flexible nonparametric class of distributions. Further, the MLE exists and can be computed with readily available algorithms. Thus, no tuning parameter, such as a bandwidth, is necessary for estimation. Due to these attractive properties, there has been considerable recent research activity concerning the theory and applications of log-concave distributions. This article gives a review of these results.

Key words and phrases: Nonparametric density estimation, shape constraint, log-concave density, Pólya frequency function, strongly unimodal, iterative convex minorant algorithm, active set algorithm.
1. INTRODUCTION

There has been considerable recent activity in the area of inference under shape constraints, that is, inference about a (say) function $f$ under the constraint that $f$ satisfies certain qualitative properties, such as monotonicity or convexity on certain subsets of its domain. This approach is appealing for two main reasons: First, such shape constraints are sometimes direct consequences of the problem under investigation (see, e.g., Hampel, 1987, or Wang et al., 2005), or they are at least plausible in many problems. It is then desirable that the result of the inference reflect this fact. There is also the hope that imposing these constraints will improve the quality of the resulting estimator in some sense. The second reason is that alternative nonparametric estimators, such as kernel estimators, typically require the choice of a tuning parameter such as a bandwidth. A good choice for such a tuning parameter is usually far from trivial and injects a certain amount of subjectivity into the estimator. In contrast, inference under shape constraints often results in an explicit solution that does not depend on a tuning parameter.

In the context of density estimation, Grenander (1956) derived the nonparametric maximum likelihood estimator of a density function that is nonincreasing on a half-line. This estimator is given explicitly by the left derivative of the least concave majorant of the empirical distribution function. However, this result does not carry over to the problem of estimating a unimodal density with unknown mode, as then the nonparametric MLE does not exist; see, for example, Birgé (1997). Even if the mode is known, the estimator suffers from inconsistency near the mode, the so-called spiking problem; see, for example, Woodroofe and Sun (1993). These results are unfortunate since the constraint of unimodality is cited as a reasonable assumption in many problems.
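
The Grenander construction can be sketched in a few lines of Python. This is a minimal illustration, assuming a sample of strictly positive values and a density supported on $[0, \infty)$; the function name `grenander` is hypothetical. The least concave majorant of the empirical distribution function is the upper convex hull of the points $(0, 0)$ and $(X_{(i)}, i/n)$, and the slopes of its segments give the estimator.

```python
def grenander(sample):
    """Grenander MLE of a nonincreasing density on [0, infinity).

    Returns (knots, heights): the estimate equals heights[j] on the
    interval (knots[j], knots[j + 1]] and 0 beyond knots[-1].
    """
    xs = sorted(sample)
    if not xs or xs[0] <= 0:
        raise ValueError("sample must be nonempty and strictly positive")
    n = len(xs)
    # Vertices of the empirical c.d.f. graph, starting at the origin.
    points = [(0.0, 0.0)] + [(x, (i + 1) / n) for i, x in enumerate(xs)]
    hull = []
    for px, py in points:
        # Drop the last hull vertex while it lies on or below the chord from
        # the previous vertex to the new point; what remains is concave.
        while len(hull) >= 2:
            (x0, y0), (x1, y1) = hull[-2], hull[-1]
            if (y1 - y0) * (px - x0) <= (py - y0) * (x1 - x0):
                hull.pop()
            else:
                break
        hull.append((px, py))
    knots = [x for x, _ in hull]
    # Left derivative of the majorant = slope of each hull segment.
    heights = [(y1 - y0) / (x1 - x0)
               for (x0, y0), (x1, y1) in zip(hull, hull[1:])]
    return knots, heights
```

For the sample `[1, 1.5, 6]` the estimate equals 4/9 on $(0, 1.5]$ and 2/27 on $(1.5, 6]$: the point $(1, 1/3)$ lies below the majorant and is not a knot.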

It was argued in Walther (2002) that log-concave densities are an attractive and natural alternative to the class of unimodal densities: The class of log-concave densities is a subset of the class of the unimodal densities, but it contains most of the commonly used parametric distributions and is thus a rich and useful nonparametric model. Moreover, it was shown in Walther (2002) that the nonparametric MLE of a univariate log-concave density exists and can be computed with readily available algorithms.

Due to these attractive properties, there has been considerable recent research activity concerning the statistical properties of the MLE, computational aspects, applications in modeling and inference, as well as the multivariate case. As an example, Figure 1 shows a scatterplot of measurements on 569 individuals from the Wisconsin breast cancer data set; see Section 6 for a more detailed description. The data were clustered using a two-component normal mixture model fitted with the EM algorithm; see, for example, Fraley and Raftery (2002). The contour lines of the fitted normal components are shown in the left plot, while the right plot shows the contour lines that result when the normal MLE is replaced by the log-concave MLE in the EM algorithm. The log-concave MLE automatically adapts to the multivariate skewness of the data and results in a superior clustering: Each observation is either a benign or a malignant instance. These labels were not used for the fitting but can be employed to assess the quality of the clustering. The EM algorithm with the log-concave MLE resulted in 121 misclassified instances versus 144 for the Gaussian MLE.

This article gives an overview of recent results about inference and modeling with the log-concave MLE. Section 2 gives some basic properties and applications of log-concave distributions. Section 3 addresses the MLE and its statistical properties. Computational aspects are surveyed in Section 4, while Section 5 describes recent advances in the multivariate setting. Section 6 reviews applications of the log-concave MLE for various modeling and inference problems. Section 7 lists some open problems for future work.

2. BASIC PROPERTIES AND APPLICATIONS OF LOG-CONCAVE FUNCTIONS

A function $f$ on $\mathbb{R}^d$ is log-concave if it is of the form

$$f(x) = \exp\varphi(x) \qquad (1)$$

for some concave function $\varphi: \mathbb{R}^d \to [-\infty, \infty)$. A prime example is the normal density, where $\varphi(x)$ is a quadratic in $x$. Further, most common univariate parametric densities are log-concave, such as the normal family, all gamma densities with shape parameter $\ge 1$, all Weibull densities with exponent $\ge 1$, all beta densities with both parameters $\ge 1$, the generalized Pareto and the logistic density; see, for example, Marshall and Olkin (1979).
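
Membership of these families can be sanity-checked numerically: on an equally spaced grid, a concave $\varphi = \log f$ has nonpositive second differences. The sketch below is only a necessary condition on the grid, not a proof, and the function names are hypothetical. It confirms the behavior of the gamma family around shape parameter 1 and of the logistic density.

```python
import math


def gamma_logpdf(x, shape, scale=1.0):
    """Log of the gamma density with the given shape and scale, for x > 0."""
    return ((shape - 1.0) * math.log(x) - x / scale
            - math.lgamma(shape) - shape * math.log(scale))


def logistic_logpdf(x):
    """Log of the standard logistic density (symmetric, so use |x| to avoid overflow)."""
    a = abs(x)
    return -a - 2.0 * math.log1p(math.exp(-a))


def looks_log_concave(logpdf, grid, tol=1e-9):
    """True if all second differences of log f on an equally spaced grid are <= tol.

    This checks a necessary condition on the grid only; it is a sanity check.
    """
    v = [logpdf(t) for t in grid]
    return all(v[i - 1] - 2.0 * v[i] + v[i + 1] <= tol
               for i in range(1, len(v) - 1))
```

With shape 0.5 the term $-0.5 \log x$ is convex, so the gamma check fails, while shapes 1 and 2 pass.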

Log-concave functions have a number of properties that are desirable for modeling: Marginal distributions, convolutions and product measures of log-concave distributions are again log-concave; see, for example, Dharmadhikari and Joag-Dev (1988). Notably, the first two properties are not true for the class of unimodal densities. Log-concave distributions may be skewed, and this flexibility is relevant in a number of applications; see, for example, Section 6. On the other hand, log-concave distributions necessarily have tails that decay at least exponentially fast and nondecreasing hazard rates; see, for example, Karlin (1968) and Barlow and Proschan (1975).

There are several alternative characterizations and designations for the class of univariate log-concave distributions: Ibragimov (1956) proved that these are precisely the distributions whose convolution with a unimodal distribution is always unimodal; thus, log-concave distributions are sometimes referred to as strongly unimodal. Log-concave densities are also precisely the Pólya frequency functions of order 2, as well as precisely those densities $f$ for which the location family $f_\theta(x) := f(x - \theta)$ has monotone likelihood ratio in $x$; see Karlin (1968).
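
The monotone likelihood ratio characterization can be illustrated on a grid: for $\theta_1 > \theta_0$, the log ratio $\log f(x - \theta_1) - \log f(x - \theta_0)$ should be nondecreasing in $x$. The sketch below (hypothetical function names, grid check only) contrasts the log-concave logistic density with the Cauchy density, which is unimodal but not log-concave.

```python
import math


def logistic_logpdf(x):
    """Log of the standard logistic density."""
    a = abs(x)
    return -a - 2.0 * math.log1p(math.exp(-a))


def cauchy_logpdf(x):
    """Log of the standard Cauchy density (unimodal, not log-concave)."""
    return -math.log(math.pi) - math.log1p(x * x)


def has_monotone_lr(logpdf, theta0, theta1, grid, tol=1e-12):
    """Check on a grid that f(x - theta1) / f(x - theta0) is nondecreasing in x."""
    lr = [logpdf(x - theta1) - logpdf(x - theta0) for x in grid]
    return all(b >= a - tol for a, b in zip(lr, lr[1:]))
```

For the Cauchy location family the ratio rises and then falls back toward 1, so the check fails.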

Log-concave distribution models have been found useful in economics (see, e.g., An, 1995, 1998; Bagnoli and Bergstrom, 2005 and Caplin and Nalebuff, 1991), in reliability theory (see, e.g., Barlow and Proschan, 1975) and in sampling and nonparametric Bayesian analysis (see, e.g., Gilks and Wild, 1992; Dellaportas and Smith, 1993 and Brooks, 1998). Recent advances in inference have led to fruitful applications of log-concave distributions in other areas such as clustering, some of which will be discussed in Section 6.

Fig. 2. The histogram of n = 270 flow cytometry data (top left), the log-concave MLE $\hat f_n$ (top right), the estimated c.d.f. (bottom left), and $\hat\varphi_n = \log \hat f_n$ (bottom right).

3. PROPERTIES OF THE NONPARAMETRIC MLE

If $X_1, \ldots, X_n$ are i.i.d. observations from a univariate log-concave density (1), then the nonparametric MLE exists, is unique, and is of the form $\hat f_n = \exp\hat\varphi_n$, where $\hat\varphi_n$ is continuous and piecewise linear on $[X_{(1)}, X_{(n)}]$ with the set of knots contained in $\{X_1, \ldots, X_n\}$, and $\hat\varphi_n = -\infty$ on $\mathbb{R} \setminus [X_{(1)}, X_{(n)}]$; see Walther (2002), Rufibach (2006) or Pal, Woodroofe and Meyer (2007). An example is plotted in Figure 2.
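
Because $\hat\varphi_n$ is piecewise linear, $\hat f_n$ and its c.d.f. can be evaluated exactly from the knots and the values of $\hat\varphi_n$ there: the integral of $\exp$ of a linear function over a cell of width $h$ with end values $a, b$ is $h\,e^{a}(e^{b-a}-1)/(b-a)$. The class below is a hypothetical helper that takes the knots and $\varphi$ values as given (it does not compute the MLE); it normalizes by the total mass, which equals 1 at the MLE.

```python
import bisect
import math


def cell_mass(width, phi_left, phi_right):
    """Integral of exp(phi) over a cell of the given width where phi is linear."""
    d = phi_right - phi_left
    if d == 0.0:
        return width * math.exp(phi_left)
    # expm1(d) / d stays accurate when d is small
    return width * math.exp(phi_left) * math.expm1(d) / d


class PiecewiseLogLinear:
    """Density proportional to exp(phi), phi linear between sorted knots, -inf outside."""

    def __init__(self, knots, phi):
        self.x, self.phi = list(knots), list(phi)
        masses = [cell_mass(self.x[j + 1] - self.x[j], self.phi[j], self.phi[j + 1])
                  for j in range(len(self.x) - 1)]
        # Cumulative integral of exp(phi) at each knot, before normalization.
        self.cum = [0.0]
        for m in masses:
            self.cum.append(self.cum[-1] + m)
        self.total = self.cum[-1]  # equals 1 at the log-concave MLE

    def _cell(self, t):
        return min(bisect.bisect_right(self.x, t) - 1, len(self.x) - 2)

    def _phi_at(self, t, j):
        w = (t - self.x[j]) / (self.x[j + 1] - self.x[j])
        return (1.0 - w) * self.phi[j] + w * self.phi[j + 1]

    def logpdf(self, t):
        if t < self.x[0] or t > self.x[-1]:
            return -math.inf
        return self._phi_at(t, self._cell(t)) - math.log(self.total)

    def pdf(self, t):
        lp = self.logpdf(t)
        return 0.0 if lp == -math.inf else math.exp(lp)

    def cdf(self, t):
        if t <= self.x[0]:
            return 0.0
        if t >= self.x[-1]:
            return 1.0
        j = self._cell(t)
        partial = cell_mass(t - self.x[j], self.phi[j], self._phi_at(t, j))
        return (self.cum[j] + partial) / self.total
```

For example, with knots $0, 1$ and $\varphi = (0, \log 2)$ the unnormalized mass is $1/\log 2$ and the c.d.f. at $0.5$ is $\sqrt{2} - 1$.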

Consistency of $\hat f_n$ with respect to the Hellinger metric was established in Pal, Woodroofe and Meyer (2007), while Dümbgen and Rufibach (2009) provide results on the uniform consistency on compact subsets of the interior of the support: If $\varphi$ belongs to a Hölder class with exponent $\beta \in [1,2]$, then $\hat\varphi_n$ and $\hat f_n$ are uniformly consistent with rate $O_p((\log n/n)^{\beta/(2\beta+1)})$. Thus, in the typical case $\beta = 2$, $\hat f_n$ converges uniformly with rate $O_p((\log n/n)^{2/5})$. It is known that these rates are optimal even if $\beta$ were known. This establishes that the nonparametric MLE adapts to the unknown local smoothness of $f$, at least for $\beta \in [1,2]$. Further, under some regularity conditions, the c.d.f. $\hat F_n$ of $\hat f_n$ is asymptotically equivalent to the empirical c.d.f. $\mathbb{F}_n$: If $\beta > 1$, then $|\hat F_n - \mathbb{F}_n|$ is of order $o_p(n^{-1/2})$ uniformly over compact subsets of the interior of the support. Moreover, $\mathbb{F}_n - n^{-1} \le \hat F_n \le \mathbb{F}_n$ on the set of knots of $\hat\varphi_n$. The resulting uniform $\sqrt{n}$-consistency of $\hat F_n$ outperforms, for example, c.d.f.s of kernel estimators using a nonnegative kernel with optimally chosen bandwidth. While empirical evidence suggests that $\hat f_n$ performs well over the whole line, establishing the corresponding theoretical results is still an open problem.
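
As a quick check of the exponents quoted above, evaluating the rate exponent $\beta/(2\beta+1)$ at the two ends of the Hölder range $\beta \in [1,2]$ gives

$$\frac{\beta}{2\beta+1}\bigg|_{\beta=1} = \frac{1}{3}, \qquad \frac{\beta}{2\beta+1}\bigg|_{\beta=2} = \frac{2}{5}, \qquad \frac{d}{d\beta}\,\frac{\beta}{2\beta+1} = \frac{1}{(2\beta+1)^2} > 0,$$

so the uniform rate for $\hat f_n$ improves monotonically from $(\log n/n)^{1/3}$ to $(\log n/n)^{2/5}$ as the smoothness increases, and stays slower than the $n^{-1/2}$ scale on which $\hat F_n$ and $\mathbb{F}_n$ differ by $o_p(n^{-1/2})$.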

Balabdaoui, Rufibach and Wellner (2009) derive the pointwise limiting distributions of

$$n^{k/(2k+1)}\bigl(\hat f_n(x_0) - f(x_0)\bigr) \quad\text{and}\quad n^{(k-1)/(2k+1)}\bigl(\hat f_n'(x_0) - f'(x_0)\bigr),$$

and likewise for $\hat\varphi_n$ and $\hat\varphi_n'$, where $k \ge 2$ is the smallest integer such that $\varphi^{(k)}(x_0) \neq 0$. They show that these limiting distributions depend on the "lower invelope" of an integrated Brownian motion process minus a drift term that depends on $k$.

4. COMPUTATIONAL ASPECTS 

Maximizing the log-likelihood function under the constraint $\int \exp\varphi(x)\,dx = 1$ is equivalent to maximizing $\sum_{i=1}^n \varphi(X_i) - n\int \exp\varphi(x)\,dx$ over the set of all concave functions $\varphi$; see Silverman (1982). Due to the piecewise linear form of the solution $\hat\varphi_n$, one can write this as a finite-dimensional optimization problem as follows: For the ordered data $x_1 < \cdots < x_n$ write $\varphi_1 := \varphi(x_1)$ and denote the slope between $x_{i-1}$ and $x_i$ by $s_i := (\varphi(x_i) - \varphi(x_{i-1}))/(x_i - x_{i-1})$, $i = 2, \ldots, n$. Then the optimization problem is to maximize

$$\Psi_n(\varphi_1, s_2, \ldots, s_n) = n\varphi_1 + \sum_{i=2}^n (n-i+1)(x_i - x_{i-1})\,s_i - n\exp(\varphi_1)\sum_{i=2}^n \frac{\exp\bigl(\sum_{k=2}^{i}(x_k - x_{k-1})s_k\bigr) - \exp\bigl(\sum_{k=2}^{i-1}(x_k - x_{k-1})s_k\bigr)}{s_i}$$

(a summand with $s_i = 0$ is read as its limit $(x_i - x_{i-1})\exp\bigl(\sum_{k=2}^{i-1}(x_k - x_{k-1})s_k\bigr)$) under the constraint that the vector $(\varphi_1, s_2, \ldots, s_n)$ belongs to the cone $C_n := \{y \in \mathbb{R}^n : y_2 \ge \cdots \ge y_n\}$.
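
The reparametrization can be checked numerically: evaluating $\Psi_n$ from $(\varphi_1, s_2, \ldots, s_n)$ must agree with computing $\sum_i \varphi(x_i) - n\int\exp\varphi$ directly from the values $\varphi(x_i)$. The sketch below does exactly that; all function names are hypothetical, and the 0-based list `s[j]` holds the slope $s_{j+2}$ on $[x_{j+1}, x_{j+2}]$.

```python
import math


def to_slope_coords(x, phi):
    """Map values phi(x_1..x_n) at sorted distinct x to (phi_1, [s_2, ..., s_n])."""
    s = [(phi[i] - phi[i - 1]) / (x[i] - x[i - 1]) for i in range(1, len(x))]
    return phi[0], s


def in_cone(s, tol=1e-12):
    """Membership in C_n: slopes s_2 >= ... >= s_n (phi_1 is unconstrained)."""
    return all(b <= a + tol for a, b in zip(s, s[1:]))


def psi_closed_form(x, phi1, s):
    """Psi_n(phi_1, s_2, ..., s_n) from the slope parametrization."""
    n = len(x)
    dx = [x[j + 1] - x[j] for j in range(n - 1)]
    # n * phi_1 + sum_{i=2}^n (n - i + 1)(x_i - x_{i-1}) s_i, with i = j + 2
    linear = n * phi1 + sum((n - j - 1) * dx[j] * s[j] for j in range(n - 1))
    integral, c_prev = 0.0, 0.0  # c = sum_{k=2}^{i} (x_k - x_{k-1}) s_k
    for j in range(n - 1):
        c = c_prev + dx[j] * s[j]
        if s[j] == 0.0:
            integral += dx[j] * math.exp(c_prev)  # limit as s_i -> 0
        else:
            integral += (math.exp(c) - math.exp(c_prev)) / s[j]
        c_prev = c
    return linear - n * math.exp(phi1) * integral


def psi_direct(x, phi):
    """sum_i phi(x_i) - n * integral of exp(phi), phi linear between the x_i."""
    n = len(x)
    integral = 0.0
    for j in range(n - 1):
        d = phi[j + 1] - phi[j]
        factor = 1.0 if d == 0.0 else math.expm1(d) / d
        integral += (x[j + 1] - x[j]) * math.exp(phi[j]) * factor
    return sum(phi) - n * integral
```

Agreement of the two functions (up to rounding) confirms the bookkeeping of the linear term, where each slope $s_i$ contributes to the $n - i + 1$ values $\varphi(x_i), \ldots, \varphi(x_n)$.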
$\Psi_n$ is a concave function on $\mathbb{R}^n$ which needs to be maximized over the convex cone $C_n$. This is precisely the type of problem for which the Iterative Convex Minorant Algorithm (ICMA) was developed; see Groeneboom and Wellner (1992) and Jongbloed (1998). The key idea of that algorithm is to approximate the concave function locally around the current candidate solution by a quadratic form, which is then maximized by a Newton procedure over the cone by using the pool-adjacent-violators algorithm. This procedure is then iterated to the final solution. Walther (2002), Pal, Woodroofe and Meyer (2007) and Rufibach (2007) successfully employ the ICMA for this problem. The last reference gives a very detailed description of the algorithm and also compares the ICMA to several other algorithms that can be used for this problem, such as an interior point method; see, for example, Terlaky and Vial (1998). The ICMA shows a clearly superior performance in these simulation studies. Recently, Dümbgen, Hüsler and Rufibach (2007) have computed the log-concave MLE with an active set algorithm; see, for example, Fletcher (1987). Active set algorithms have the attractive property that they find the solution in finitely many steps, while the iterations of the ICMA have to be terminated by a stopping criterion. It appears that the active set algorithm provides the most efficient method for computing the MLE to date. Both the ICMA and the active set algorithm for computing the log-concave MLE are available with the R package "logcondens," which is accessible from "CRAN." An alternative way to compute the MLE with convex programming algorithms is described in Koenker and Mizera (2008).
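
The cone step at the heart of the ICMA can be sketched in isolation. Assuming, as in the usual ICMA formulation, a quadratic approximation with diagonal weights, maximizing it over a monotone cone is a weighted isotonic regression, which the pool-adjacent-violators algorithm solves exactly. The sketch shows only that building block (no quadratic approximation, line search or stopping rule); the function names are hypothetical.

```python
def pava_nonincreasing(z, w):
    """Minimize sum_i w[i] * (y[i] - z[i])**2 subject to y[0] >= y[1] >= ... .

    Weighted pool-adjacent-violators: scan left to right and merge adjacent
    blocks into their weighted mean whenever the ordering is violated.
    """
    blocks = []  # entries: [weighted mean, total weight, number of points]
    for zi, wi in zip(z, w):
        blocks.append([float(zi), float(wi), 1])
        while len(blocks) >= 2 and blocks[-2][0] < blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            total = w1 + w2
            blocks.append([(w1 * m1 + w2 * m2) / total, total, c1 + c2])
    fit = []
    for mean, _, count in blocks:
        fit.extend([mean] * count)
    return fit


def project_onto_cone(v, w):
    """Weighted projection of (phi_1, s_2, ..., s_n) onto C_n; phi_1 stays free."""
    return [float(v[0])] + pava_nonincreasing(v[1:], w[1:])
```

For instance, projecting $(7, 1, 3)$ with unit weights leaves $\varphi_1 = 7$ untouched and pools the violating slopes 1 and 3 into their mean 2.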

Another advantage of the log-concave MLE $\hat f_n$ is that sampling from $\hat f_n$ is quite straightforward: First, compute the c.d.f. $\hat F_n$ at the ordered sample $x_1, \ldots, x_n$ by integrating the piecewise exponential function $\hat f_n$. Next, generate a random index $J \in \{2, \ldots, n\}$ with $P(J = j) = \hat F_n(x_j) - \hat F_n(x_{j-1})$. Then generate $U \sim U[0,1]$ and set $\delta := \hat\varphi_n(x_J) - \hat\varphi_n(x_{J-1})$. If $\delta \neq 0$, set $V := \log(1 + (\exp(\delta) - 1)U)/\delta$, otherwise set $V := U$. Then $X := x_{J-1} + (x_J - x_{J-1})V$ has density $\hat f_n$.
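
The sampling recipe translates directly into code. In this sketch the function name is hypothetical and the knots and values of $\hat\varphi_n$ are assumed to come from an already computed fit; cell probabilities are computed relative to the total mass, so $\varphi$ need not be normalized. The within-cell step is the inverse c.d.f. of the density proportional to $e^{\delta v}$ on $[0,1]$, written with `log1p`/`expm1` for accuracy.

```python
import bisect
import math
import random


def sample_log_concave_mle(knots, phi, size, rng=None):
    """Draw `size` points from the density proportional to exp(phi).

    phi holds phi(x_1), ..., phi(x_n) at the sorted knots and is linear in
    between; the support is [knots[0], knots[-1]].
    """
    rng = rng or random.Random()
    masses = []
    for j in range(len(knots) - 1):
        d = phi[j + 1] - phi[j]
        factor = 1.0 if d == 0.0 else math.expm1(d) / d
        masses.append((knots[j + 1] - knots[j]) * math.exp(phi[j]) * factor)
    cum, running = [], 0.0
    for m in masses:
        running += m
        cum.append(running)
    out = []
    for _ in range(size):
        # Cell j (0-based) corresponds to J = j + 2 in the text: [x_{J-1}, x_J].
        j = bisect.bisect_left(cum, rng.random() * running)
        delta = phi[j + 1] - phi[j]
        u = rng.random()
        # Inverse c.d.f. of the density proportional to exp(delta * v) on [0, 1].
        v = u if delta == 0.0 else math.log1p(math.expm1(delta) * u) / delta
        out.append(knots[j] + (knots[j + 1] - knots[j]) * v)
    return out
```

With knots $0, 1$ and $\varphi = (0, \log 2)$ the target density is proportional to $2^x$ on $[0,1]$, whose mean is $2 - 1/\log 2 \approx 0.557$; large simulated samples should average close to that value.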

5. THE MULTIVARIATE CASE 

The definition of a log-concave density does not depend on the underlying dimension; see (1). The fact that the MLE does not require the choice of a tuning parameter makes its use even more attractive in a multivariate setting, where, for example, a kernel estimator requires the difficult choice of a bandwidth matrix. The structure of the multivariate MLE is analogous to the univariate case; see, for example, Cule, Samworth and Stewart (2008): The support of the MLE is the convex hull of the data, and there is a triangulation of this convex hull such that $\log \hat f_n$ is linear on each simplex of the triangulation. Figure 3 depicts an example for two-dimensional data. The multivariate MLE has already shown promise in a number of applications; see Section 6.
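
In two dimensions this structure means that, once a triangulation and the values of $\log \hat f_n$ at the data points are known, the estimate at any point is obtained by barycentric interpolation within its triangle, and it is $-\infty$ off the convex hull. The sketch below only illustrates this evaluation step (it does not compute the MLE); the function names and the toy triangulation are hypothetical, with values chosen so that the surface is the minimum of two planes and hence concave.

```python
import math


def barycentric(p, a, b, c):
    """Barycentric coordinates of point p with respect to triangle (a, b, c)."""
    det = (b[1] - c[1]) * (a[0] - c[0]) + (c[0] - b[0]) * (a[1] - c[1])
    l1 = ((b[1] - c[1]) * (p[0] - c[0]) + (c[0] - b[0]) * (p[1] - c[1])) / det
    l2 = ((c[1] - a[1]) * (p[0] - c[0]) + (a[0] - c[0]) * (p[1] - c[1])) / det
    return l1, l2, 1.0 - l1 - l2


def piecewise_linear_log_density(p, points, triangles, phi, tol=1e-12):
    """Evaluate log f at p when log f is linear on each triangle.

    points: data points (x, y); triangles: index triples into points that
    triangulate their convex hull; phi: values of log f at the data points.
    """
    for i, j, k in triangles:
        l1, l2, l3 = barycentric(p, points[i], points[j], points[k])
        if min(l1, l2, l3) >= -tol:  # p lies in this triangle
            return l1 * phi[i] + l2 * phi[j] + l3 * phi[k]
    return -math.inf  # outside the convex hull of the data
```

On the unit square split into triangles $(0,0),(1,0),(0,1)$ and $(1,0),(1,1),(0,1)$ with values $0, -1, -1, -3$, the point $(0.25, 0.25)$ evaluates to $-0.5$ and $(0.75, 0.75)$ to $-2$.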

The computation of the MLE requires an approach that is different from the univariate setting, as the multivariate piecewise linear structure of $\log \hat f_n$ does not allow this optimization problem to be written in terms of a simple ordering of the slopes. Cule, Samworth and Stewart (2008) show how the MLE can be computed by solving a nondifferentiable convex optimization problem using Shor's r-algorithm; see Kappel and Kuntsevich (2000). Cule, Samworth and Stewart (2008) report a robust and accurate performance of this algorithm, which they implemented in the R package LogConcDEAD; see Cule, Gramacy and Samworth (2009). However, the computation time increases quickly with sample size and dimension. Cule, Samworth and Stewart (2008) report computation times ranging from about 1 sec for n = 100 observations in two dimensions to 37 min for a sample of size n = 1000 in four dimensions. It is therefore desirable to develop faster algorithms for this problem.

Cule, Samworth and Stewart (2008) investigate the finite sample performance of the multivariate MLE via a simulation study. They compare the mean integrated squared error of the MLE with that of a kernel estimator with Gaussian kernel and a bandwidth that is either chosen to minimize the mean integrated squared error (using knowledge about the density that would not be available in practice) or determined by an empirical bandwidth selector based on least squares cross validation. The MLE outperforms both of these estimators except for small sample sizes, and the improvement can be quite dramatic. On the other hand, in view of the work of Birgé and Massart (1993), it seems unlikely that the MLE will achieve optimal rates of convergence in dimensions $d > 4$, due to the richness of the class of concave functions. It would thus be helpful to have theoretical results about the performance of the multivariate MLE. Deriving such results is an open problem.

6. APPLICATIONS IN MODELING AND INFERENCE

One of the most fruitful applications of log-concave distributions has been in the area of clustering. A principled and successful approach to assigning the observations to clusters is via the mixture model $f(x) = \sum_{m=1}^k \pi_m f_m(x)$, where the mixture proportions $\pi_m$ are nonnegative and sum to unity, and the component distributions $f_m$ model the conditional density of the data in the $m$th cluster; see, for example, McLachlan and Peel (2000). Typically one assumes a parametric formulation $f_m(x) = f(\theta_m, x)$ for the component distributions, such as the normal model; see, for example, Fraley and Raftery (2002). Then the EM algorithm provides an elegant solution to fit the above mixture model and to assign the data to one of the $k$ components: The EM algorithm iteratively assigns the data based on the current maximum likelihood estimates of the component distributions, and then updates those estimates $\hat\pi_m, \hat\theta_m$ based on these assignments. An important advantage of using a mixture model for clustering is that it provides not only an assignment of the data to the $k$ components, but also a measure of uncertainty for this assignment via the posterior probabilities that the $i$th observation belongs to the $m$th component:

$$\frac{\hat\pi_m f(\hat\theta_m, X_i)}{\sum_{l=1}^k \hat\pi_l f(\hat\theta_l, X_i)}.$$
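
A minimal sketch of this parametric EM loop for univariate normal components may help fix ideas. The E-step computes the posterior membership probabilities and the M-step computes weighted normal MLEs; all function names and the simulated data are hypothetical. In the log-concave version discussed below, the M-step would instead fit a weighted log-concave MLE to each component, which is not shown here.

```python
import math
import random


def normal_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def e_step(data, weights, params):
    """Posterior probability that observation i belongs to component m."""
    tau = []
    for x in data:
        num = [w * normal_pdf(x, mu, sd) for w, (mu, sd) in zip(weights, params)]
        total = sum(num)
        tau.append([v / total for v in num])
    return tau


def m_step(data, tau):
    """Weighted normal MLEs (the step a log-concave EM would replace)."""
    n, k = len(data), len(tau[0])
    weights, params = [], []
    for m in range(k):
        wm = [row[m] for row in tau]
        tot = sum(wm)
        mu = sum(w * x for w, x in zip(wm, data)) / tot
        var = sum(w * (x - mu) ** 2 for w, x in zip(wm, data)) / tot
        weights.append(tot / n)
        params.append((mu, math.sqrt(var)))
    return weights, params


def em(data, init_params, n_iter=50):
    k = len(init_params)
    weights, params = [1.0 / k] * k, list(init_params)
    for _ in range(n_iter):
        tau = e_step(data, weights, params)
        weights, params = m_step(data, tau)
    return weights, params, e_step(data, weights, params)


def simulate_two_normals(n_per_component, means=(0.0, 6.0), seed=1):
    """Hypothetical test data: two well-separated unit-variance normal clusters."""
    rng = random.Random(seed)
    return [rng.gauss(mu, 1.0) for mu in means for _ in range(n_per_component)]
```

Each row of the returned posterior matrix sums to one; assigning every observation to the component with the largest posterior probability gives the clustering, and the probabilities themselves quantify its uncertainty.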

A disadvantage of this approach is that it depends on the parametric formulation in several important ways: If the parametric model is misspecified, then the accuracy of the clustering may deteriorate and the measure of uncertainty may be considerably off. For some data, such as those in Figure 2, no appropriate parametric model may be available. Another disadvantage is that each parametric model requires a different implementation of the EM algorithm based on certain theoretical derivations; see, for example, McLachlan and Krishnan (1997).
Fig. 3. The MLE $\hat f_n$ (left) and $\hat\varphi_n = \log \hat f_n$ (right) for n = 1000 observations (plotted as dots) from a standard bivariate normal distribution. The plots are from Cule, Samworth and Stewart (2008).



Therefore, it is desirable to have an EM-type clustering algorithm with nonparametric component distributions. This would allow for a universal software implementation with flexible component distributions. As was expounded in Sections 1 and 2, the class of log-concave distributions provides a flexible model, and, moreover, the MLE exists. Thus, one may attempt to mimic the EM-type clustering algorithm that works so well in the parametric context. This idea was successfully carried out in Chang and Walther (2007) and in Cule, Samworth and Stewart (2008). In related work, Eilers and Borgdorff (2007) use a nonparametric smoother in place of the log-concave MLE in the M-step, with a penalty term that moves the estimate toward a log-concave function. Chang and Walther (2007) report a clear improvement compared to the parametric EM algorithm when the parametric model is not correct, and a performance that is close to that of the Gaussian EM algorithm in the case where the Gaussian model is correct. Thus, the use of log-concave component distributions provides a flexible methodology for clustering, and this flexibility does not entail any noticeable penalty in the special case where a parametric model is appropriate.

Chang and Walther (2007) also consider a multivariate extension by modeling each component distribution with log-concave marginals and a normal copula for the dependence structure. This simple multivariate extension avoids the more challenging task of estimating a multivariate log-concave density, but it is flexible enough for many situations. Figure 4 compares the fitted components with those for the Gaussian model for simulated bivariate data. The log-concave model automatically picks up the skewness in the y-direction and results in a noticeably improved error rate for the clustering; see Chang and Walther (2007) for details.

Fig. 4. Contours of the estimated model obtained from the log-concave EM algorithm of Chang and Walther (2007) (left) and from the Gaussian EM algorithm (right) based on the plotted observations. The underlying distribution has a skewed (shifted gamma) distribution in the y-direction of the top component. The plots are from Chang and Walther (2007).

Cule, Samworth and Stewart (2008) extend this approach by using the multivariate log-concave MLE for each component. They apply the log-concave EM algorithm to the Wisconsin breast cancer data of Street et al. (1993) and obtain only 121 misclassified instances compared to 144 with the Gaussian EM algorithm. Figure 5 shows a scatterplot of the data and the fitted log-concave mixture. The contour plots of the fitted components from the Gaussian EM algorithm and the log-concave EM algorithm are given in Figure 1.

Developing principled methodology for selecting an appropriate number of components is an open problem. Methodology for testing for the presence of mixing in the log-concave model is given by Walther (2001) and Walther (2002), where the latter approach uses the fact that a log-concave mixture allows the representation $\exp(\varphi(x) + c\|x\|^2)$ for some $c > 0$ and a concave function $\varphi$.

While log-concave distributions allow for flexible modeling, the structure provided by a log-concave estimator has turned out to result in advantageous properties in a number of other inference problems:

Dümbgen and Rufibach (2009) use the fact that the hazard rate of a log-concave density is automatically monotone and construct a simple plug-in estimator of the hazard rate which is nondecreasing. Rates of convergence for $\hat f_n$ automatically translate to rates for the hazard rate estimator.
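
A plug-in hazard estimate $\hat f_n(t)/(1 - \hat F_n(t))$ can be computed exactly from the knots and values of $\hat\varphi_n$, and the normalizing constant cancels in the ratio. The sketch below is a hypothetical helper that takes a concave piecewise linear $\varphi$ as given (it does not fit the MLE); it illustrates that the resulting hazard is nondecreasing, as guaranteed by log-concavity.

```python
import bisect
import math


def hazard_plugin(knots, phi, t):
    """Plug-in hazard f(t) / (1 - F(t)) for the density proportional to exp(phi).

    phi is linear between the sorted knots; t must lie in [knots[0], knots[-1]).
    The normalizing constant cancels in the ratio, so phi need not be normalized.
    """
    if not knots[0] <= t < knots[-1]:
        raise ValueError("t must lie in [knots[0], knots[-1])")

    def mass(width, a, b):
        # integral of exp(linear) over a cell with end values a and b
        d = b - a
        return width * math.exp(a) * (1.0 if d == 0.0 else math.expm1(d) / d)

    j = bisect.bisect_right(knots, t) - 1
    w = (t - knots[j]) / (knots[j + 1] - knots[j])
    phi_t = (1.0 - w) * phi[j] + w * phi[j + 1]
    # Unnormalized mass to the right of t: rest of cell j plus all later cells.
    right = mass(knots[j + 1] - t, phi_t, phi[j + 1])
    right += sum(mass(knots[i + 1] - knots[i], phi[i], phi[i + 1])
                 for i in range(j + 1, len(knots) - 1))
    return math.exp(phi_t) / right
```

With knots $0, 1, 2, 3$ and values $0, 0.5, 0.2, -1$ (slopes $0.5, -0.3, -1.2$, hence concave), the hazard evaluated on a fine grid increases throughout $[0, 3)$.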

Müller and Rufibach (2009) report an improved performance for certain problems in extreme value theory when employing a log-concave estimator.

Dümbgen, Hüsler and Rufibach (2007) show how the assumption of log-concavity allows the estimation of a distribution based on arbitrarily censored data using the EM algorithm. They replace the log-likelihood function by a function that is linear in $\varphi$. This function can be interpreted as the conditional expectation of the log-likelihood function given the available data and represents the E-step in the EM algorithm. The M-step consists of maximizing this function using the active set algorithm described in Section 4.

Balabdaoui, Rufibach and Wellner (2009) investigate the mode of $\hat f_n$ as an estimator of the mode of $f$. Estimation of the mode of a unimodal density has received considerable attention in the literature. Typically, some choice of bandwidth or tuning parameter is required due to the problems with the MLE of a unimodal density described in Section 1. The MLE of a log-concave density does not suffer from this problem and provides an estimate of the mode as a by-product. Balabdaoui, Rufibach and Wellner (2009) establish the limiting distribution of this estimator and show that the estimator is optimal in the asymptotic minimax sense.
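
Obtaining the mode is indeed a by-product: since $\hat\varphi_n$ is linear between consecutive knots, its maximum is attained at a knot. The two-line sketch below (hypothetical function name, knots and values taken as given) returns the first maximizing knot; by concavity, if two knots tie, $\hat\varphi_n$ is constant between them and every point in between is a mode.

```python
def mode_of_piecewise_log_linear(knots, phi):
    """Return a mode of the density proportional to exp(phi), phi concave and
    linear between the sorted knots; the maximum is attained at a knot."""
    j = max(range(len(phi)), key=lambda i: phi[i])
    return knots[j]
```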

7. SUMMARY AND FUTURE WORK 

Log-concave distributions constitute a flexible nonparametric class which allows modeling and inference without a tuning parameter. The MLE has favorable theoretical performance properties and can be computed with available algorithms. These advantageous properties have resulted in tangible improvements in a number of relevant problems, such as in clustering and when handling censored data.

As for future work, there is clearly the potential for similar improvements in a host of other problems, such as regression (see, e.g., Eilers, 2005) or Cox regression under shape constraints on the hazard rate. Further, it would be useful to study the consequences of model misspecification. For example, the mode of the log-concave MLE is a useful tool for data analysis. It would thus be interesting to investigate how far off this mode can be from the population mode in the case where the population distribution is unimodal but not log-concave. The outstanding performance of the multivariate MLE reported in the simulation studies in Cule, Samworth and Stewart (2008) lends importance to a theoretical investigation of its convergence properties. Finally, it would be desirable to develop faster algorithms for computing the multivariate MLE.

For modeling with heavier, algebraic tails, it may be of interest to consider the more general class of p-concave densities; see Avriel (1972), Borell (1975) and Dharmadhikari and Joag-Dev (1988). First results about nonparametric estimation and computational issues in this class were obtained in Koenker and Mizera (2008) and Seregin (2008).
Fig. 5. The Wisconsin breast cancer data (top), with benign cases as open circles and malignant cases as crosses. The bottom plot shows the fitted mixture distribution from the log-concave EM algorithm. The plots are from Cule, Samworth and Stewart (2008).